A Filtering Approach to Stochastic Variational Inference
Stochastic variational inference (SVI) uses stochastic optimization to scale up Bayesian computation to massive data. We present an alternative perspective on SVI as approximate parallel coordinate ascent. SVI trades off bias and variance to step close to the unknown true coordinate optimum given by batch variational Bayes (VB). We define a model to automate this process. As a consequence of this construction, we update the variational parameters using Bayes' rule rather than a hand-crafted optimization schedule.
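For context, the classic SVI step that this filtering view reinterprets blends the current global variational parameter with a noisy minibatch estimate of the batch-VB coordinate optimum, following a hand-crafted step-size schedule. A minimal sketch, with illustrative names and the usual Robbins-Monro schedule (not the paper's notation):

```python
import numpy as np

def svi_update(lam, lam_hat, t, tau=1.0, kappa=0.7):
    """One classic SVI step on a global variational parameter.

    lam     : current variational parameter (natural-parameter vector)
    lam_hat : noisy minibatch estimate of the batch-VB coordinate optimum,
              rescaled as if the minibatch were the full dataset
    t       : iteration counter
    tau, kappa : the hand-crafted schedule that the filtering view
                 replaces with a Bayes'-rule update
    """
    rho = (t + tau) ** (-kappa)               # decaying Robbins-Monro step size
    return (1.0 - rho) * lam + rho * lam_hat  # convex blend: bias vs. variance

lam = np.ones(5)           # current natural parameters
lam_hat = np.full(5, 2.0)  # rescaled minibatch estimate
lam = svi_update(lam, lam_hat, t=1)
```

The rho-weighted blend is where the bias-variance trade-off lives: a large step trusts the noisy minibatch estimate, a small step trusts the running parameter. The filtering perspective treats choosing this step as an inference problem rather than a tuning problem.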
ADMM-based Networked Stochastic Variational Inference
Owing to recent advances in "Big Data" modeling and prediction tasks, variational Bayesian estimation has gained popularity for its ability to provide exact solutions to approximate posteriors. One key technique for approximate inference is stochastic variational inference (SVI), which poses variational inference as a stochastic optimization problem and solves it iteratively using noisy gradient estimates. It handles massive data for prediction and classification tasks by applying complex Bayesian models that have observed as well as latent variables. This paper aims to decentralize SVI, allowing parallel computation, secure learning, and robustness benefits. We use the Alternating Direction Method of Multipliers (ADMM) in a top-down setting to develop a distributed SVI algorithm in which independent learners running inference algorithms need only share their estimated model parameters, not their private datasets. We then extend this distributed SVI-ADMM algorithm to an ADMM-based networked SVI algorithm, in which the learners not only work distributively but also share information according to the rules of a graph over which they form a network. This work lies under the umbrella of 'deep learning over networks'. We verify our algorithm on a topic-modeling problem for a corpus of Wikipedia articles, illustrate the results with the latent Dirichlet allocation (LDA) topic model on large-scale document classification, compare performance with the centralized algorithm, and corroborate the analytical results with numerical experiments.
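To make the parameter-sharing scheme concrete, here is a generic consensus-ADMM skeleton (a sketch under standard assumptions, not the paper's algorithm): each learner optimizes against its private data locally and shares only its parameter estimate, and a consensus step averages the estimates. All names and signatures below are hypothetical.

```python
import numpy as np

def consensus_admm(local_solvers, n_learners, dim, rho=1.0, iters=50):
    """Consensus ADMM: learners share parameter estimates, never raw data.

    local_solvers[i](z, u, rho) must return
        argmin_x f_i(x) + (rho/2) * ||x - z + u||^2
    for learner i's private objective f_i.
    """
    x = np.zeros((n_learners, dim))  # local parameter estimates
    u = np.zeros((n_learners, dim))  # scaled dual variables
    z = np.zeros(dim)                # shared consensus variable
    for _ in range(iters):
        for i in range(n_learners):  # local step: private data stays with learner i
            x[i] = local_solvers[i](z, u[i], rho)
        z = (x + u).mean(axis=0)     # consensus step: only parameters are exchanged
        u += x - z                   # dual update enforcing agreement
    return z

# Toy usage: learner i's private objective is ||x - a_i||^2, whose penalized
# minimizer has a closed form.
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
solvers = [lambda z, u, rho, a=a: (2 * a + rho * (z - u)) / (2 + rho)
           for a in targets]
print(consensus_admm(solvers, n_learners=3, dim=2))  # converges to the mean of targets
```

In the networked variant described above, the global averaging step would be replaced by averaging over each learner's graph neighbors, so no single coordinator is needed.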
Stochastic Collapsed Variational Inference for Hidden Markov Models
Hidden Markov models (HMMs) [1] are popular probabilistic models for sequential data in a variety of fields, including natural language processing, speech recognition, weather forecasting, financial prediction, and bioinformatics. However, their traditional inference methods, such as variational inference (VI) [2] and Markov chain Monte Carlo (MCMC) [3], are not readily scalable to large datasets; for example, one dataset in our experiments consists of 100 million observations. An important milestone in scaling VI was reached by Hoffman et al. [4], who proposed stochastic VI (SVI), which computes cheap gradients from minibatches of data and updates the model parameters before a complete pass over the full dataset. A more recent scalable and more accurate algorithm was proposed by Foulds et al. [5], who applied such stochastic optimization to collapsed latent Dirichlet allocation (LDA) [6]; their stochastic collapsed variational inference (SCVI) algorithm has been successful in large-scale topic modelling.
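As a reference point for the stochastic updates described above, a generic SVI/SCVI-style step on global sufficient statistics rescales a minibatch estimate to the size of the full dataset and blends it in with a decaying step size. A hedged sketch with illustrative names, not the paper's exact HMM update:

```python
import numpy as np

def stochastic_stat_update(global_stats, minibatch_stats, n_total, n_batch,
                           t, tau=1.0, kappa=0.7):
    """One stochastic (collapsed) variational step on global statistics.

    global_stats    : running estimate of corpus-wide sufficient statistics
    minibatch_stats : statistics accumulated from the current minibatch
    n_total, n_batch: dataset and minibatch sizes, used to rescale the
                      minibatch into an unbiased full-data estimate
    t               : iteration counter
    """
    rho = (t + tau) ** (-kappa)                     # decaying step size
    scaled = (n_total / n_batch) * minibatch_stats  # rescale to full dataset
    return (1.0 - rho) * global_stats + rho * scaled

stats = np.zeros(3)
batch = np.array([2.0, 1.0, 0.0])  # counts from one minibatch
stats = stochastic_stat_update(stats, batch, n_total=1_000_000, n_batch=100, t=1)
```

This is what lets the parameters move before a complete pass over the full dataset: each minibatch stands in for the whole corpus, and the step size controls how much each noisy stand-in is trusted.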